ON THE NUMBER OF SUPPORT POINTS OF MAXIMIN AND
BAYESIAN OPTIMAL DESIGNS

By Dietrich Braess and Holger Dette 
Ruhr-Universität Bochum

We consider maximin and Bayesian D-optimal designs for nonlin- 
ear regression models. The maximin criterion requires the specifica- 
tion of a region for the nonlinear parameters in the model, while the 
Bayesian optimality criterion assumes that a prior for these parame- 
ters is available. On interval parameter spaces, it was observed empir- 
ically by many authors that an increase of uncertainty in the prior in- 
formation (i.e., a larger range for the parameter space in the maximin 
criterion or a larger variance of the prior in the Bayesian criterion) 
yields a larger number of support points of the corresponding opti- 
mal designs. In this paper, we present analytic tools which are used to 
prove this phenomenon in concrete situations. The proposed method- 
ology can be used to explain many empirically observed results in the 
literature. Moreover, it explains why maximin D-optimal designs are 
usually supported at more points than Bayesian D-optimal designs. 

1. Optimal designs for nonlinear models. Consider the common problem of nonlinear experimental design where the scalar response variable, say $Y$, is distributed as a member of the exponential family with

$$E[Y \mid x] = \eta(x, \theta), \qquad (1.1)$$

where $\theta \in \mathbb{R}^m$ is the unknown parameter, $x$ denotes the explanatory variable that varies in a compact space, say $\mathcal{X}$, and $\eta$ is a given function. We assume that observations under different experimental conditions are independent and denote the Fisher information matrix for the parameter $\theta$ at the point $x$ by $I(x, \theta)$; see [7]. Throughout this paper, we assume that the function $\eta$ is continuously differentiable with respect to $\theta$ and that the conditional variance exists.

An approximate design $\xi$ for this model is a probability measure on the design space $\mathcal{X}$ with finite support $x_1, \dots, x_n$ and weights $w_1, \dots, w_n$ representing the relative proportions of total observations taken at the corresponding design points; see, for example, [11]. The information matrix of a design $\xi$ is defined by

$$M(\xi, \theta) = \int_{\mathcal{X}} I(x, \theta)\, d\xi(x) = \sum_{i=1}^{n} w_i\, I(x_i, \theta), \qquad (1.3)$$

and a local optimal design maximizes a given function of the matrix $M(\xi, \theta)$; see [18]. We consider the D-optimality criterion $\log|M(\xi, \theta)|$ for discriminating among competing designs. In general, such a design depends on the unknown parameter $\theta$, which must be specified for its implementation. Local optimality criteria have been criticized by numerous authors because the resulting optimal designs can be highly inefficient in the true model if the unknown parameters are misspecified.
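Definition (1.3) and the D-criterion can be evaluated directly once the Fisher information is available. The following is a minimal sketch, assuming (as in (2.7) below) the rank-one form $I(x, \theta) = f(x, \theta) f^T(x, \theta)$ with a user-supplied gradient function `f`; the helper names `info_matrix`, `det`, `log_det_criterion` and `f_growth` are hypothetical. The test model is (3.7) of Example 3.3, with $f(x, \beta) = (1, -x e^{-\beta x})^T$.

```python
import math


def info_matrix(xs, ws, f, theta):
    """M(xi, theta) = sum_k w_k f(x_k, theta) f(x_k, theta)^T, cf. (1.3)."""
    m = len(f(xs[0], theta))
    M = [[0.0] * m for _ in range(m)]
    for x, w in zip(xs, ws):
        v = f(x, theta)
        for i in range(m):
            for j in range(m):
                M[i][j] += w * v[i] * v[j]
    return M


def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def log_det_criterion(xs, ws, f, theta):
    """D-criterion log|M(xi, theta)|; returns -inf for a singular information matrix."""
    d = det(info_matrix(xs, ws, f, theta))
    return math.log(d) if d > 0.0 else -math.inf


def f_growth(x, beta):
    """Gradient of eta = alpha + exp(-beta x) with respect to (alpha, beta), model (3.7)."""
    return [1.0, -x * math.exp(-beta * x)]
```

For example, `det(info_matrix([0.0, 1/3], [0.5, 0.5], f_growth, 3.0))` reproduces the value $1/(4(e\beta)^2)$ with $\beta = 3$ stated in Example 3.3, while any one-point design gives a singular matrix and hence the criterion value $-\infty$.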

A more robust approach has been achieved in practice by using the concepts of Bayesian and maximin optimality, since additional information on the uncertainty in those parameters can be incorporated. A priori knowledge of the experimenter can be modeled mathematically as follows. Assume that $\theta \in \Theta$, where $\Theta \subset \mathbb{R}^m$ denotes a set, and let $\pi$ denote a probability measure on $\Theta$. A design is called Bayesian D-optimal (with respect to a given prior $\pi$ on $\Theta$) if it maximizes the function

$$\int_{\Theta} \log|M(\xi, \theta)|\, \pi(d\theta). \qquad (1.4)$$

Bayesian D-optimal designs have been studied by Chaloner and Larntz [2], Pronzato and Walter [17], Mukhopadhyay and Haines [15], Dette and Neugebauer [5, 6] and many others.
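For a one-parameter model, the integral in (1.4) can be approximated by quadrature. The sketch below assumes the exponential model (3.2) of Example 3.2, whose Fisher information is $I(x, \beta) = x^2 e^{-2\beta x}$ by (3.3), a uniform prior on $[1, B]$, and the composite midpoint rule; the function names are hypothetical. For a one-point design at $x$, $\log M(\delta_x, \beta) = 2\log x - 2\beta x$ is linear in $\beta$, so the criterion equals $2\log x - x(B+1)$ and is maximized at $x = 2/(B+1)$; for $B = 10$ this is $0.182$, the design reported for $B = 10$ in Table 2.

```python
import math


def info_exp(x, beta):
    """Fisher information (3.3) for model (3.2): I(x, beta) = x^2 exp(-2 beta x)."""
    return x * x * math.exp(-2.0 * beta * x)


def bayes_d_criterion(xs, ws, b_max, n_nodes=2000):
    """Midpoint-rule approximation of int log M(xi, beta) pi(d beta)
    for the uniform prior pi on [1, b_max] (m = 1, so |M| = M)."""
    h = (b_max - 1.0) / n_nodes
    total = 0.0
    for i in range(n_nodes):
        beta = 1.0 + (i + 0.5) * h
        m_val = sum(w * info_exp(x, beta) for x, w in zip(xs, ws))
        total += math.log(m_val)
    # (1 / (b_max - 1)) * sum(h * log M) simplifies to the plain average
    return total / n_nodes


def best_one_point(b_max, grid):
    """Grid search for the best one-point design delta_x."""
    return max(grid, key=lambda x: bayes_d_criterion([x], [1.0], b_max, 200))


grid = [i / 1000 for i in range(50, 400)]
x_best = best_one_point(10.0, grid)
```

Because the integrand is linear in $\beta$ for one-point designs, the midpoint rule is exact here up to rounding; for designs with several support points it is only an approximation.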

In some circumstances, it is difficult for the experimenter to specify a prior on the parameter space $\Theta$. Therefore, several authors have proposed standardized maximin D-optimal designs, that is, designs which maximize

$$\min_{\theta \in \Theta} \frac{|M(\xi, \theta)|}{|M(\xi[\theta], \theta)|}, \qquad (1.5)$$

where $\xi[\theta]$ denotes the local D-optimal design for fixed $\theta$; see, for example, [3, 9, 16] or [4]. The criterion (1.5) does not compare the quantities $|M(\xi, \theta)|$ directly, but relative to the values that could be obtained if $\theta$, and, as a consequence, the local D-optimal design, were known. Bayesian and standardized maximin D-optimal designs can be given explicitly only in rare circumstances. Moreover, optimal designs often require more than $m$ support points unless the parameter space $\Theta$ is "sufficiently narrow" or the prior in the Bayesian criterion puts most of its mass on a small subset of $\Theta$. For interval parameter spaces, it has been observed empirically that the number of support points increases if less knowledge about $\theta$ is incorporated in the optimality criteria (see [2] and [4], among others).

In the present paper, we provide analytic tools for deciding rigorously whether the number of support points is unbounded if the a priori information on an interval parameter space is repeatedly diminished. It turns out that the answer to this question depends in a complicated way on the given nonlinear model. Therefore, we mainly concentrate on the case where only one parameter, say $\beta$, enters nonlinearly in the optimality criterion, and we briefly mention possible extensions to the general case in the Appendix. For the maximin criterion, we quantify uncertainty by considering an interval of increasing length, while for the Bayesian case we use the properties of the prior; this includes a uniform prior on an interval of increasing length. We establish sufficient conditions on the nonlinear model such that increasing uncertainty about the nonlinear parameter leads to an arbitrarily large number of support points of Bayesian and standardized maximin D-optimal designs. In fact, the tools in this paper are applicable to all models known to us.

The conditions are more restrictive for the Bayesian D-optimal design 
than for the standardized maximin D-optimal one. This explains why stan- 
dardized maximin D-optimal designs are usually supported at more points 
than Bayesian D-optimal designs. In particular, in the case of Bayesian D- 
optimality, the number of support points may increase so slowly that it is 
almost impossible to decide by numerical computations whether the number 
is asymptotically bounded or not. 

In Section 3 we discuss standardized maximin optimal designs, while Sec- 
tion 4 deals with the Bayesian case. We illustrate our approach for models 
with one, two and three parameters. The proposed methodology is a gen- 
eral one, but the technical difficulties for the verification of the conditions 
differ in each scenario and increase with the dimension. Finally, some con- 
clusions are given in Section 5, while all technical details are deferred to 
the Appendix. For further examples, see the technical report of Braess and 
Dette [1]. 

2. Preliminaries. For the sake of simplicity, we assume that the local D-optimal design depends only on one component of the parameter $\theta \in \mathbb{R}^m$. The general situation can be obtained by a straightforward generalization, which is briefly indicated in Appendix A.3. We denote this component by $\beta$ and the corresponding local D-optimal design by $\xi[\beta]$. Consequently, we reflect only this dependence in our notation, and the optimality criteria in (1.4) and (1.5) are represented by the functions

$$\int_B \log|M(\xi, \beta)|\, \pi(d\beta), \qquad (2.1)$$

$$\min\left\{ \frac{|M(\xi, \beta)|}{|M(\xi[\beta], \beta)|} \;:\; \beta_{\min} \le \beta \le \beta_{\max} \right\}, \qquad (2.2)$$

respectively. Here, $M(\xi, \beta)$ is the information matrix (1.3) in the nonlinear model, $B = [\beta_{\min}, \beta_{\max}]$ represents the prior knowledge about the location of the unknown parameter $\beta$, and $\pi$ denotes a prior on $B$.

Let $\xi$ be a design on $\mathcal{X}$ with masses $w_k$ at the support points $x_k$ ($k = 1, \dots, n$). The information matrix of $\xi$ is then given by

$$M(\xi, \beta) = \sum_{k=1}^{n} w_k\, I(x_k, \beta).$$

Throughout this paper,

$$Q(\beta, \bar\beta) = \frac{|M(\xi[\bar\beta], \beta)|}{|M(\xi[\beta], \beta)|} \qquad (2.3)$$

quantifies the loss of information if $\beta$ is the "true" unknown parameter, while the experimenter uses the local D-optimal design for the (wrong) guess $\bar\beta$. Note that $Q(\beta, \bar\beta)^{1/m}$ is the D-efficiency of the design $\xi[\bar\beta]$. We derive sufficient conditions under which the number of support points of the optimal designs with respect to the criteria (2.1) and (2.2) exceeds any given number if the amount of prior information is decreased.
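Because (2.3) only involves determinants of information matrices of local D-optimal designs, it is easy to evaluate once $\xi[\beta]$ is known. A minimal sketch for the one-parameter exponential model (3.2), assuming the local D-optimal design $\xi[\beta] = \delta_{1/\beta}$ derived in Example 3.2; the function names are hypothetical. The checks compare with the closed form (3.5) and confirm $Q(\beta, \beta) = 1$ and $Q \le 1$, the latter because $\xi[\beta]$ maximizes $|M(\cdot, \beta)|$.

```python
import math


def info_exp(x, beta):
    """Fisher information (3.3) of model (3.2)."""
    return x * x * math.exp(-2.0 * beta * x)


def local_point(beta):
    """Support point of the local D-optimal one-point design xi[beta] (Example 3.2)."""
    return 1.0 / beta


def Q(beta, beta_bar):
    """Loss-of-information ratio (2.3): |M(xi[beta_bar], beta)| / |M(xi[beta], beta)|."""
    return info_exp(local_point(beta_bar), beta) / info_exp(local_point(beta), beta)


def d_efficiency(beta, beta_bar, m=1):
    """D-efficiency Q^(1/m) of xi[beta_bar] when beta is the true parameter."""
    return Q(beta, beta_bar) ** (1.0 / m)
```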

Our first definition quantifies the loss of efficiency caused by an application 
of a local D-optimal design based on a misspecified parameter. 



Definition 2.1. Let $\ell: B \to \mathbb{R}$ be a nondecreasing continuous function. The function $Q$ defined in (2.3) is said to be uniformly decreasing with respect to the scale $\ell$ if the following two conditions hold:

(i) for all $\beta, \bar\beta \in B$, the inequality

$$Q(\beta, \bar\beta) \le \varphi(\ell(\beta) - \ell(\bar\beta)) \qquad (2.4)$$

holds, where $\varphi$ is a real-valued function whose decay [i.e., $\varphi(z) \to 0$ as $|z| \to \infty$] is sufficiently fast, as specified later for each case under consideration;

(ii) there is a constant $\lambda > 0$ such that

$$Q(\beta, \bar\beta) \ge \tfrac{1}{2} \quad \text{whenever } |\ell(\beta) - \ell(\bar\beta)| < \lambda. \qquad (2.5)$$

There is a heuristic explanation for the two conditions in Definition 2.1. The quantity

$$d(\beta, \bar\beta) = |\ell(\beta) - \ell(\bar\beta)| \qquad (2.6)$$

can be considered a distance between the parameters $\beta$ and $\bar\beta$. This interpretation is also useful for the extension to models with more than one nonlinear parameter. On the one hand, condition (2.4) requires that the efficiency decrease sufficiently fast if the parameter is misspecified. On the other hand, condition (2.5) guarantees that the efficiency cannot become small if the parameter is only slightly misspecified. Since these conditions are very natural, they are satisfied by most of the commonly used nonlinear models with $\ell(\beta) = \beta$ or $\ell(\beta) = \log\beta$. In fact, we are not aware of any model where conditions (2.4) and (2.5) are not satisfied.

A final assumption is required for our main results. Roughly speaking, it guarantees that points in the design space $\mathcal{X}$ which are not in the support of any local D-optimal design $\xi[\beta]$ can be disregarded in the construction of the optimal designs. To be precise, we represent the Fisher information as

$$I(x, \beta) = f(x, \beta) f^T(x, \beta), \qquad (2.7)$$

where $f(x, \beta) = (f_1(x, \beta), \dots, f_m(x, \beta))^T \in \mathbb{R}^m$, and introduce, for $x_1, \dots, x_m \in \mathcal{X}$, the squared determinant

$$I_m(x_1, \dots, x_m, \beta) = \det\begin{pmatrix} f_1(x_1, \beta) & \cdots & f_m(x_1, \beta) \\ \vdots & & \vdots \\ f_1(x_m, \beta) & \cdots & f_m(x_m, \beta) \end{pmatrix}^{2}. \qquad (2.8)$$

We assume that there exists a constant, say $c_0$, such that for any $x = (x_1, \dots, x_m)^T \in \mathcal{X}^m$ there exist local D-optimal designs $\xi[\beta^{(1)}], \dots, \xi[\beta^{(m)}]$ with $\beta^{(j)} \in B$ such that

$$|I_m(x_1, \dots, x_m, \beta)| \le c_0 \sum_{j=1}^{m} |M(\xi[\beta^{(j)}], \beta)| \qquad \forall\, \beta \in B. \qquad (2.9)$$

Additionally, we define $m_\nu \ge 0$ as the number of points which are support points of every local D-optimal design in the nonlinear model, and we assume $m > m_\nu$. For example, if $\eta(x, \theta) = \theta_0 + \theta_1 e^{-\theta_2 x}$ and $x \in [0, 1]$, it follows from [8] that every local D-optimal design has three support points, including the points 0 and 1. For this model, we have $m = 3$ and $m_\nu = 2$.
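The quantity (2.8) and its relation to designs can be checked numerically. For an $m$-point design with weights $w_1, \dots, w_m$ and $F$ the matrix with rows $f^T(x_i, \beta)$, one has $M = F^T \operatorname{diag}(w) F$ and hence $|M| = w_1 \cdots w_m\, I_m(x_1, \dots, x_m, \beta)$; this identity is standard linear algebra rather than a statement from the text. The sketch below (hypothetical helper names) uses the gradients of models (3.7) and (3.8), dropping the constant factor $\alpha_2$ in the latter.

```python
import math


def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def I_m(xs, beta, f):
    """(2.8): squared determinant of the m x m matrix with rows f(x_i, beta)^T."""
    return det([f(x, beta) for x in xs]) ** 2


def design_det(xs, ws, beta, f):
    """|M(xi, beta)| for the design with weights ws at the points xs."""
    rows = [f(x, beta) for x in xs]
    m = len(rows[0])
    M = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(m)]
         for i in range(m)]
    return det(M)


def f_growth(x, beta):
    """Gradient for model (3.7) with respect to (alpha, beta)."""
    return [1.0, -x * math.exp(-beta * x)]


def f_three(x, beta):
    """Gradient for model (3.8) w.r.t. (alpha_1, alpha_2, beta), factor alpha_2 dropped."""
    e = math.exp(-beta * x)
    return [1.0, e, -x * e]
```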

Finally, we note that assumption (2.9) is obviously satisfied in models 
where the local D-optimal designs are supported at a minimal number of m 
points. In examples where the local D-optimal designs are supported at more 
than m points, the condition has to be checked in each situation. However, 
thus far, the authors have not found a case where (2.9) is not satisfied. 



3. Standardized maximin D-optimal designs. The following result shows that, for nonlinear models, the number of support points of the standardized maximin D-optimal design can become arbitrarily large under the assumptions stated in Section 2.
Theorem 3.1. Assume that $Q$ is uniformly decreasing with respect to the scale $\ell$ in the sense of Definition 2.1, where

$$\varphi(z) \le c_1 |z|^{-\gamma} \quad \text{with } c_1 > 0,\ \gamma > m, \qquad (3.1)$$

and that (2.9) is satisfied. Let $N \in \mathbb{N}$ be given. If $\ell(\beta_{\max}) - \ell(\beta_{\min})$ is sufficiently large, then the standardized maximin D-optimal design with respect to the interval $B = [\beta_{\min}, \beta_{\max}]$ is supported at more than $N$ points.

Moreover, if the local D-optimal designs are minimally supported, that is, supported at only $m$ points, then the condition $\gamma > m$ in (3.1) can be replaced by $\gamma > m - m_\nu$.

The technical proof is deferred to the Appendix. The main idea is to use condition (3.1) to show that the value of the standardized maximin D-optimality criterion of any $N$-point design is of order $O(B^{-\gamma})$, where, with a slight abuse of notation, $B = \ell(\beta_{\max}) - \ell(\beta_{\min})$ denotes the length of the parameter interval on the $\ell$-scale. In a second step, condition (2.5) is used to construct a design with $n > N$ support points for which the value of the criterion is bounded below by a constant multiple of $B^{-m}$. Since $\gamma > m$, it follows for sufficiently large $B$ that no design with $N$ support points can be standardized maximin D-optimal.
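The final comparison in this argument is elementary. Writing the two bounds in the form used in the Appendix, (A.1) and (A.2), with constants $d_1, d_2 > 0$ that do not depend on $B$, the inequality can be solved explicitly for $B$:

$$d_1 (N+1) B^{-\gamma} < d_2 B^{-m} \iff B^{\gamma - m} > \frac{d_1 (N+1)}{d_2} \iff B > \left(\frac{d_1 (N+1)}{d_2}\right)^{1/(\gamma - m)}.$$

The last step uses $\gamma > m$, so that the exponent $1/(\gamma - m)$ is positive. Since $d_1$ itself depends on $N$ in the proof, the threshold for $B$ can grow rapidly with $N$, which is in line with Remark 3.5.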

In the following we will illustrate the application of Theorem 3.1 in the 
cases m = 1,2,3. The technical difficulties for the verification of the sufficient 
conditions increase with the dimension of the Fisher information. 



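Example 3.2. Consider the one-dimensional exponential growth model with normally distributed homoscedastic errors and

$$\eta(x, \beta) = e^{-\beta x}, \qquad \beta \in [1, B],\ x \in [0, 1]. \qquad (3.2)$$

The Fisher information of the parameter $\beta$ (up to a constant which does not affect the optimal design problem) is

$$I(x, \beta) = x^2 e^{-2\beta x}. \qquad (3.3)$$

The local D-optimal design is a one-point design supported at the point $x[\beta] = 1/\beta$, and it follows from

$$M(\xi[\beta], \beta) = I(x[\beta], \beta) = (e\beta)^{-2} \qquad (3.4)$$

that the function $Q$ in (2.3) is given by

$$Q(\beta, \bar\beta) = \left(\frac{\beta}{\bar\beta}\, e^{1 - \beta/\bar\beta}\right)^{2}. \qquad (3.5)$$

Hence, $Q(\beta, \bar\beta) = \psi(\beta/\bar\beta)$, where

$$\psi(y) = y^2 e^{2(1-y)} \le \begin{cases} e^2 y^2, & \text{if } y < 1, \\ 3 y^{-2}, & \text{if } y > 1. \end{cases}$$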
Table 1

Standardized maximin D-optimal designs for the exponential regression model (3.2) on the interval [0, 1] with respect to various parameter spaces [1, B]. First row: support points; second row: weights

B = 10     x:  0.142  0.771
           w:  0.553  0.447
B = 40     x:  0.037  0.193  0.772
           w:  0.414  0.272  0.314
B = 50     x:  0.028  0.131  0.374  0.972
           w:  0.379  0.221  0.170  0.230
B = 100    x:  0.014  0.064  0.156  0.287  0.838
           w:  0.336  0.193  0.093  0.137  0.241
B = 200    x:  0.007  0.034  0.101  0.250  0.326  0.856
           w:  0.306  0.182  0.147  0.089  0.066  0.210



We choose $\ell(\beta) = \log\beta$; then (2.4) holds with

$$\varphi(z) \le e^2 e^{-2|z|}. \qquad (3.6)$$

The decay is even faster than required in (3.1). Moreover, $\psi(z) \ge \frac12$ if $\frac12 \le z \le 2$, which proves property (ii) in Definition 2.1 with $\lambda = \log 2$. Finally, we verify property (2.9) for $m = 1$. Consider a point, say $x_0$. If $1 \ge x_0 \ge 1/B$, then $\delta_{x_0} = \xi[1/x_0]$ for the Dirac measure $\delta_{x_0}$ at the point $x_0$, and there is, in fact, equality in (2.9) with $\bar\beta = 1/x_0$. On the other hand, if $x_0 < 1/B$, then the Dirac measure $\delta_{x_0}$ is not local D-optimal for any $\beta \in [1, B]$, and $x_0\beta < \beta/B \le 1$ for all $\beta \in [1, B]$. Since the function $z \mapsto z^2 e^{-2z}$ is increasing on the interval $[0, 1]$, it follows that

$$\beta^2 I_1(x_0, \beta) = \beta^2 M(\delta_{x_0}, \beta) \le \beta^2 M(\delta_{1/B}, \beta) = \beta^2 M(\xi[B], \beta) \quad \text{for all } \beta \in [1, B],$$

which shows (2.9). Similarly, if $x_0 > 1$, we obtain $\bar\beta = 1$. Therefore, condition (2.9) is satisfied with

$$\bar\beta = \max\{1, \min\{B, 1/x_0\}\}.$$

By Theorem 3.1, the number of support points of the standardized maximin D-optimal design for the regression model (3.2) becomes arbitrarily large as $B \to \infty$.
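The inequalities used in this example are easy to spot-check numerically. The following sketch (hypothetical names; grid checks, not a proof) verifies the piecewise bound on $\psi$, the bound (3.6) with $z = \log\beta - \log\bar\beta$, the lower bound $\psi \ge \frac12$ on $[\frac12, 2]$, and inequality (2.9) with $c_0 = 1$ and $\bar\beta = \max\{1, \min\{B, 1/x_0\}\}$ for a few points $x_0 \in (0, 1]$.

```python
import math


def psi(y):
    """psi(y) = y^2 exp(2(1 - y)), so that Q(beta, beta_bar) = psi(beta / beta_bar)."""
    return y * y * math.exp(2.0 * (1.0 - y))


def phi_bound(z):
    """Right-hand side of (3.6): e^2 exp(-2|z|)."""
    return math.e ** 2 * math.exp(-2.0 * abs(z))


def beta_bar_for(x0, B):
    """Choice of beta_bar in the verification of (2.9); x0 = 0 is mapped to B."""
    inv = math.inf if x0 == 0.0 else 1.0 / x0
    return max(1.0, min(B, inv))


def check_29(x0, B, n=400):
    """Grid check of I(x0, beta) <= M(xi[beta_bar], beta) = I(1/beta_bar, beta) on [1, B]."""
    bb = beta_bar_for(x0, B)
    for i in range(n + 1):
        beta = 1.0 + (B - 1.0) * i / n
        lhs = x0 * x0 * math.exp(-2.0 * beta * x0)
        rhs = math.exp(-2.0 * beta / bb) / (bb * bb)
        if lhs > rhs * (1.0 + 1e-12):
            return False
    return True
```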
Numerical results in Table 1 illustrate this fact. We have calculated the standardized maximin D-optimal designs using Matlab for various parameter spaces $[1, B]$. The optimality of the calculated designs was checked by the equivalence theorem of Wong [19].
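In model (3.2), $M(\xi, \beta)/M(\xi[\beta], \beta) = \sum_k w_k (e\beta x_k)^2 e^{-2\beta x_k} = \sum_k w_k\, \psi(\beta x_k)$, so the standardized maximin criterion (2.2) of a tabulated design can be recomputed in a few lines. The sketch below is not the Matlab code used for Table 1; it approximates the minimum over $\beta$ on a log-spaced grid (in the spirit of Remark 3.6), and the names are hypothetical. For $B = 10$, it shows that the two-point design of Table 1 attains a criterion value of about $0.48$, whereas no one-point design exceeds about $0.29$.

```python
import math


def psi(y):
    """psi(y) = y^2 exp(2(1 - y)); the efficiency of delta_x at beta is psi(beta * x)."""
    return y * y * math.exp(2.0 * (1.0 - y))


def std_maximin_value(xs, ws, B, n_grid=2000):
    """Approximate min over beta in [1, B] of M(xi, beta) / M(xi[beta], beta)
    in model (3.2), using a log-spaced grid that contains both endpoints."""
    worst = math.inf
    for i in range(n_grid + 1):
        beta = B ** (i / n_grid)
        value = sum(w * psi(beta * x) for x, w in zip(xs, ws))
        worst = min(worst, value)
    return worst


table1_B10 = ([0.142, 0.771], [0.553, 0.447])
two_point = std_maximin_value(*table1_B10, 10.0)
one_point = max(std_maximin_value([k / 1000], [1.0], 10.0, 400) for k in range(100, 500))
```

For a one-point design the minimum over $\beta$ is attained at an endpoint of $[1, B]$ because $\psi$ is unimodal, so including both endpoints in the grid makes that part of the computation exact.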



Example 3.3. Consider the exponential growth model with normally distributed homoscedastic errors and

$$\eta(x, \alpha, \beta) = \alpha + e^{-\beta x}, \qquad x \in [0, 1],\ \beta \in [1, \beta_{\max}], \qquad (3.7)$$

which is used for analyzing the growth of crops; see [13]. The Fisher information matrix (up to a constant which does not change the optimal design problem) for the parameter $\theta = (\alpha, \beta)$ is given by

$$I(x, \beta) = \begin{pmatrix} 1 & -x e^{-\beta x} \\ -x e^{-\beta x} & x^2 e^{-2\beta x} \end{pmatrix},$$

and the function $I_2$ defined in (2.8) is determined as

$$I_2(x_1, x_2, \beta) = \left(x_1 e^{-\beta x_1} - x_2 e^{-\beta x_2}\right)^2.$$

Since $(a - b)^2 \le \max\{a^2, b^2\}$ for $a, b \ge 0$, the value of $I_2(x_1, x_2, \beta)$ does not decrease if one of the two points is replaced by 0. It follows from [6] that the local D-optimal design puts equal masses at the points $x_1 = 0$ and $x_2 = 1/\beta$, with corresponding determinant

$$|M(\xi[\beta], \beta)| = \frac{1}{4(e\beta)^2};$$

see also [8]. Therefore, we obtain $m = 2$, $m_\nu = 1$ and $m - m_\nu = 1$ for the quantities in the second part of Theorem 3.1. From $I_2(0, 1/\bar\beta, \beta) = \bar\beta^{-2} e^{-2\beta/\bar\beta}$, it follows that $Q(\beta, \bar\beta) = \psi(\beta/\bar\beta)$ with the same function $\psi(z) = z^2 e^{2(1-z)}$ as in Example 3.2. This shows that (2.4) and (3.1) are satisfied. Obviously,

$$I_2(x_1, x_2, \beta) \le (x_1 e^{-\beta x_1})^2 + (x_2 e^{-\beta x_2})^2 = I_2(0, x_1, \beta) + I_2(0, x_2, \beta),$$

and we conclude, as in the previous example, that the remaining assumption (2.9) of Theorem 3.1 is also satisfied. Therefore, the number of support points of the standardized maximin D-optimal design in the exponential growth model (3.7) becomes arbitrarily large if $\beta_{\max} \to \infty$.
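The two facts used in Example 3.3, the identity $Q(\beta, \bar\beta) = \psi(\beta/\bar\beta)$ and the decomposition bound for $I_2$, can be spot-checked on a grid. A minimal sketch with hypothetical names, assuming the local D-optimal design with weights $\frac12$ at $0$ and $1/\bar\beta$ stated above, so that $|M(\xi[\bar\beta], \beta)| = \frac14 I_2(0, 1/\bar\beta, \beta)$:

```python
import math


def I2(x1, x2, beta):
    """I_2 of Example 3.3: (x1 e^{-beta x1} - x2 e^{-beta x2})^2."""
    return (x1 * math.exp(-beta * x1) - x2 * math.exp(-beta * x2)) ** 2


def det_local(beta_bar, beta):
    """|M(xi[beta_bar], beta)| for the design with weights 1/2 at 0 and 1/beta_bar."""
    return 0.25 * I2(0.0, 1.0 / beta_bar, beta)


def Q(beta, beta_bar):
    """Loss-of-information ratio (2.3) for model (3.7)."""
    return det_local(beta_bar, beta) / det_local(beta, beta)


def psi(y):
    """psi(y) = y^2 exp(2(1 - y)) from Example 3.2."""
    return y * y * math.exp(2.0 * (1.0 - y))
```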



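Example 3.4. To illustrate how the technical difficulties increase with the dimension, we consider the exponential growth model with normally distributed homoscedastic errors and

$$\eta(x, \alpha_1, \alpha_2, \beta) = \alpha_1 + \alpha_2 e^{-\beta x}, \qquad x \in [0, 1],\ \beta \in [1, \beta_{\max}]. \qquad (3.8)$$

In particular, we obtain for the determinant in (2.8), up to the factor $\alpha_2^2$, which does not affect the optimal design problem,

$$\begin{aligned} I_3(x_1, x_2, x_3, \beta) &= \left[x_1 e^{-\beta x_1}(e^{-\beta x_3} - e^{-\beta x_2}) + x_2 e^{-\beta x_2}(e^{-\beta x_1} - e^{-\beta x_3}) + x_3 e^{-\beta x_3}(e^{-\beta x_2} - e^{-\beta x_1})\right]^2 \\ &= H^2(x_1, x_2, x_3, \beta), \end{aligned}$$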

where the last line defines the function $H(x_1, x_2, x_3, \beta)$. Han and Chaloner [8] showed that local D-optimal designs for the exponential regression model (3.8) have three support points and that $x = 0$ and $x = 1$ are in fact common support points of all local D-optimal designs, that is, $m = 3$, $m_\nu = 2$, $m - m_\nu = 1$. An alternative derivation of the latter fact is given below. For the points $x = 0$ and $x = 1$, the determinant reduces to

$$I_3(0, x, 1, \beta) = \left[x e^{-\beta x}(1 - e^{-\beta}) - e^{-\beta}(1 - e^{-\beta x})\right]^2 = H^2(0, x, 1, \beta). \qquad (3.9)$$

Note that sums of cyclic products such as $a(b - c) + b(c - a) + c(a - b)$ vanish. Therefore, we obtain the useful representation

$$\begin{aligned} H(x_1, x_2, x_3, \beta) &= \frac{e^{-\beta x_3} - e^{-\beta x_2}}{1 - e^{-\beta}}\left[x_1 e^{-\beta x_1}(1 - e^{-\beta}) - e^{-\beta}(1 - e^{-\beta x_1})\right] \\ &\quad + \frac{e^{-\beta x_1} - e^{-\beta x_3}}{1 - e^{-\beta}}\left[x_2 e^{-\beta x_2}(1 - e^{-\beta}) - e^{-\beta}(1 - e^{-\beta x_2})\right] \\ &\quad + \frac{e^{-\beta x_2} - e^{-\beta x_1}}{1 - e^{-\beta}}\left[x_3 e^{-\beta x_3}(1 - e^{-\beta}) - e^{-\beta}(1 - e^{-\beta x_3})\right] \\ &= \sum_{k=1}^{3} a_k H(0, x_k, 1, \beta), \end{aligned} \qquad (3.10)$$

with coefficients satisfying $|a_k| \le 1$. If the support points are ordered, that is,

$$0 \le x_1 < x_2 < x_3 \le 1, \qquad (3.11)$$

then $a_1, a_3 < 0$ and $a_2 > 0$. The estimation of the quantity $I_3$ depends heavily on knowledge of the signs of the function $H$. Obviously, $H(0, x, 1, \beta)$ vanishes at $x = 0$ and $x = 1$. Moreover, its derivative at $x = 0$ is positive. Since exponential sums of the form $b_1 + (b_2 + b_3 x)e^{-\beta x}$ have at most two real zeros (see [10], page 23), it follows that $H(0, x, 1, \beta) > 0$ for $0 < x < 1$. For the same reason, the function $x \mapsto H(x, x_2, x_3, \beta)$ vanishes only at $x = x_2$ and $x = x_3$. Hence, $\operatorname{sign} H(x_1, x_2, x_3, \beta) = \operatorname{sign} H(0, x_2, x_3, \beta)$ if the ordering (3.11) holds.
Similarly, it follows that $\operatorname{sign} H(0, x_2, x_3, \beta) = \operatorname{sign} H(0, x_2, 1, \beta) = +1$. Recalling the statement on the coefficients $a_1$, $a_2$ and $a_3$ in (3.10), we see that the summands with $k = 1$ and $k = 3$ diminish the sum, and it follows that

$$I_3(x_1, x_2, x_3, \beta) \le I_3(0, x_2, 1, \beta).$$

This not only yields the cited result regarding the location of the smallest and largest support points of local D-optimal designs, but additionally provides the bound

$$I_3(x_1, x_2, x_3, \beta) \le \sum_{k=1}^{3} I_3(0, x_k, 1, \beta),$$

which holds for support points $x_1, x_2, x_3$ in any order. In particular, we have a bound with a decomposition as stated in (2.9). Therefore, the exponential regression model can now be treated by arguments similar to those given in the previous examples, although the technical details are more involved. In particular, we have from (3.9), for large $\beta$,

$$I_3(0, x, 1, \beta) \le x^2 e^{-2\beta x}, \qquad |M(\xi[\beta], \beta)| = \frac{1}{27} \sup_{x} I_3(0, x, 1, \beta) \ge \frac{c}{(e\beta)^2}$$

for some constant $c > 0$. Although there is no simple representation for the determinant $|M(\xi[\beta], \beta)|$, the estimates above are comparable to the corresponding equations (3.3) and (3.4) for the one-dimensional exponential model considered in Example 3.2 (the estimates differ only by constants). Thus, we conclude, as in the previous examples, that the number of support points of the standardized maximin D-optimal design in the model (3.8) becomes arbitrarily large as $\beta_{\max} \to \infty$.
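The representation (3.10), the sign pattern of its coefficients and the resulting bounds on $I_3$ can be tested on random inputs. This is a numerical sanity check under model (3.8), not a substitute for the argument above; the function names are hypothetical, and the factor $\alpha_2$ is dropped from the determinant.

```python
import math
import random


def H(x1, x2, x3, beta):
    """H from Example 3.4: determinant with rows (1, e^{-beta x}, x e^{-beta x})."""
    e1, e2, e3 = (math.exp(-beta * x) for x in (x1, x2, x3))
    return x1 * e1 * (e3 - e2) + x2 * e2 * (e1 - e3) + x3 * e3 * (e2 - e1)


def H01(x, beta):
    """H(0, x, 1, beta) as in (3.9)."""
    eb = math.exp(-beta)
    return x * math.exp(-beta * x) * (1.0 - eb) - eb * (1.0 - math.exp(-beta * x))


def coefficients(x1, x2, x3, beta):
    """Coefficients a_1, a_2, a_3 of the representation (3.10)."""
    e1, e2, e3 = (math.exp(-beta * x) for x in (x1, x2, x3))
    denom = 1.0 - math.exp(-beta)
    return (e3 - e2) / denom, (e1 - e3) / denom, (e2 - e1) / denom


def check_example_34(n_trials=2000, seed=1):
    """True if (3.10), |a_k| <= 1, the sign pattern and both I_3 bounds hold on random inputs."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x1, x2, x3 = sorted(rng.uniform(0.01, 0.99) for _ in range(3))
        beta = rng.uniform(1.0, 20.0)
        a = coefficients(x1, x2, x3, beta)
        h = H(x1, x2, x3, beta)
        hs = [H01(x, beta) for x in (x1, x2, x3)]
        if abs(h - sum(ak * hk for ak, hk in zip(a, hs))) > 1e-12:
            return False
        if any(abs(ak) > 1.0 + 1e-12 for ak in a) or a[0] > 0 or a[1] < 0 or a[2] > 0:
            return False
        if min(hs) <= 0.0 or h < -1e-15:
            return False
        if h * h > hs[1] ** 2 + 1e-15:  # I_3(x1, x2, x3) <= I_3(0, x2, 1)
            return False
        # unordered arguments: compare with the sum over all three points
        if H(x3, x1, x2, beta) ** 2 > sum(v * v for v in hs) + 1e-15:
            return False
    return True
```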

Remark 3.5. At first glance, the results of Theorem 3.1 and the ex- 
amples (including those in the technical report of Braess and Dette [1]) are 
surprising because it has never been observed in numerical studies that the 
number of support points of the standardized maximin D-optimal design 
substantially exceeds the number of parameters. To our knowledge, the nu- 
merical results of Example 3.2 are the first ones in this direction. However, it 
follows from the proof of Theorem 3.1 in the Appendix that the construction 
of a design with more than N support points outperforming a given design 
requires a very large parameter space in the maximin D-optimality criterion. 
In particular, the number of support points of the standardized maximin D- 
optimal design may increase so slowly that it is almost impossible to decide 
by means of numerical computation whether it is asymptotically bounded 
or not. Thus, in practice, optimal designs with a large number of support 
points will only be observed if a very large parameter space is involved. 
Remark 3.6. It was pointed out by a referee that the minimum in the standardized maximin criterion is usually calculated over a finite grid in the interval $[\beta_{\min}, \beta_{\max}]$. A careful inspection of the proof of Theorem 3.1 shows that the results of this section can be extended to such situations.



4. Bayesian D-optimal designs. We now turn to analogous questions for the Bayesian D-optimality criterion (2.1). For Bayesian D-optimal designs, it makes no difference whether the information matrix or its standardized analogue is used: the difference between the criterion

$$\int_B \log\frac{|M(\xi, \beta)|}{|M(\xi[\beta], \beta)|}\, \pi(d\beta) = \int_B \left[\log|M(\xi, \beta)| - \log|M(\xi[\beta], \beta)|\right] \pi(d\beta) \qquad (4.1)$$

and the function in (2.1) is a constant that does not depend on the design $\xi$. In this case, uncertainty can be specified directly by a prior supported on the interval $B = [\beta_{\min}, \beta_{\max}]$, where $-\infty < \beta_{\min} < \beta_{\max} < \infty$. For example, one might enlarge the support of the prior without changing its shape, or one could fix the support $B$ of $\pi$ and change the shape such that its variance increases. The following result covers both cases.

Theorem 4.1. Assume that (2.9) holds and that $Q$ is uniformly decreasing with respect to the scale $\ell$ in the sense of Definition 2.1, where the function $\varphi$ satisfies

$$\varphi(z) \le c_1 e^{-\gamma|z|} \qquad (4.2)$$

for some positive constants $c_1$, $\gamma$, and that the prior and the function $\ell$ in Definition 2.1 satisfy, for all measurable sets $A \subset B = [\beta_{\min}, \beta_{\max}]$,

$$\frac{1}{\ell(B)} \int_A d\ell(\beta) \le c_3 \int_A \pi(d\beta) \qquad (4.3)$$

for some positive constant $c_3$. Let $N \in \mathbb{N}$ be given. If $\ell(\beta_{\max}) - \ell(\beta_{\min})$ is sufficiently large, then the Bayesian D-optimal design with respect to the prior $\pi$ on the interval $B$ is supported at more than $N$ points.

Remark 4.2. Note that enlarging the interval $B$ in the optimality criterion (2.1) such that condition (4.3) is satisfied also changes the prior $\pi$ on $B$. A typical example is the uniform distribution on the set $B$, for which assumption (4.3) is obviously satisfied if $\ell(\beta) = \beta$ or $\ell(\beta) = \log\beta$. In this case, the shape of the prior does not change, as will be illustrated in Example 4.3. On the other hand, uncertainty can also be quantified by changing the shape of the prior, and in this case the function $\ell$ usually changes with $\pi$ (see Examples 4.4 and 4.5 below).
Table 2

Bayesian D-optimal designs with respect to a uniform distribution on the interval [1, B] in the exponential regression model (3.2). First row: support points; second row: weights

B = 10     x:  0.182
           w:  1.000
B = 40     x:  0.048  0.354
           w:  0.981  0.019
B = 50     x:  0.038  0.318
           w:  0.973  0.027
B = 100    x:  0.019  0.215
           w:  0.962  0.038
B = 200    x:  0.010  0.134
           w:  0.959  0.041
B = 300    x:  0.006  0.084  0.236
           w:  0.957  0.037  0.006
B = 3000   x:  0.0006 0.009  0.055  1.000
           w:  0.951  0.039  0.006  0.004



Example 4.3. Consider the exponential regression model (3.2) of Example 3.2. Obviously, the function $\varphi$ in (3.6) also satisfies the stronger assumption (4.2) of Theorem 4.1, with $c_1 = e^2$ and $\gamma = 2$. As a consequence, the number of support points of Bayesian D-optimal designs with respect to a uniform distribution is unbounded if the support of the prior is increased. Table 2 shows the Bayesian D-optimal designs corresponding to the situation considered in Table 1. Note that the standardized maximin D-optimal designs have considerably more support points than the Bayesian D-optimal designs with respect to the uniform prior.
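Whether a tabulated Bayesian design is optimal can be checked with the general equivalence theorem for Bayesian D-optimality, a standard result that we assume here: for $m = 1$, a design $\xi^*$ maximizes (2.1) if and only if $d(x) = \int_B I(x, \beta)/M(\xi^*, \beta)\, \pi(d\beta) \le 1$ for all $x \in \mathcal{X}$. Among one-point designs, $\int \log M(\delta_x, \beta)\, \pi(d\beta) = 2\log x - x(B+1)$ for the uniform prior on $[1, B]$, so the best one is $\delta_{x^*}$ with $x^* = 2/(B+1)$, and for it $d(x) = (x/x^*)^2 \frac{1}{B-1}\int_1^B e^{-2\beta(x - x^*)}\, d\beta$ in closed form. The sketch below (hypothetical names, grid search over $x$) is consistent with Table 2: for $B = 10$ the maximum of $d$ is 1, attained at $x^*$, while for $B = 40$ it exceeds 1 (for instance, $d(0.354) \approx 1.2$), so a second support point is needed.

```python
import math


def mean_exp(c, B):
    """(1/(B-1)) * integral_1^B exp(-c * beta) d(beta), evaluated stably via expm1."""
    if c == 0.0:
        return 1.0
    t = c * (B - 1.0)
    return -math.exp(-c) * math.expm1(-t) / t


def sensitivity(x, x_star, B):
    """d(x) for the one-point design delta_{x_star}, uniform prior on [1, B], model (3.2)."""
    return (x / x_star) ** 2 * mean_exp(2.0 * (x - x_star), B)


def max_sensitivity(x_star, B, n=2000):
    """Maximum of d(x) over the grid x = 1/n, 2/n, ..., 1 of the design space [0, 1]."""
    return max(sensitivity(k / n, x_star, B) for k in range(1, n + 1))
```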

Example 4.4. Consider the logistic regression model $Y \sim \operatorname{Bin}(1, \eta(x, \beta))$ with

$$\eta(x, \beta) = \frac{1}{1 + e^{\beta - x}}, \qquad x \in [0, \infty),\ \beta \in [0, \infty), \qquad (4.4)$$

where the Fisher information of the parameter $\beta$ at the point $x$ is given by

$$I(x, \beta) = \frac{e^{x - \beta}}{(1 + e^{x - \beta})^2}.$$

For any $a \in (0, 1)$, we consider the prior

$$\pi_a(d\beta) = c\, a\, e^{-a\beta}\, \mathbf{1}_{[0, 1/a)}(\beta)\, d\beta,$$

where $c = (1 - e^{-1})^{-1}$. Note that the expectation and the variance of $\pi_a$ are proportional to $1/a$ and $1/a^2$, respectively. If we define

$$\ell_a(d\beta) = c\, a^{1/2} e^{-a\beta}\, \mathbf{1}_{[0, 1/a)}(\beta)\, d\beta,$$

then $\ell_a(B) = a^{-1/2} \to \infty$ as $a \to 0$, and a straightforward calculation shows that condition (4.3) holds with $c_3 = 1$.

The local D-optimal design is a one-point design concentrating its mass at the point $x[\beta] = \beta$ with $M(\xi[\beta], \beta) = 1/4$. Hence,

$$Q(\beta, \bar\beta) = \frac{4 e^{\bar\beta - \beta}}{(1 + e^{\bar\beta - \beta})^2} \le 4 e^{-|\beta - \bar\beta|}.$$

An application of the mean value theorem shows that condition (2.4) is satisfied with $\varphi(z) = 4 e^{-|z|}$ if $a \le c^{-2}$. Moreover, $Q(\beta, \bar\beta) \ge \frac12$ if $|\ell_a(\beta) - \ell_a(\bar\beta)| < a^{1/2}$, and Theorem 4.1 applies. As $a \to 0$, the quantity $\ell_a(B) = a^{-1/2}$ becomes arbitrarily large, and the number of support points of the Bayesian D-optimal design with respect to the prior $\pi_a$ in the logistic regression model (4.4) exceeds any given bound $N \in \mathbb{N}$.
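The bounds in Example 4.4 can be spot-checked numerically for a particular value of $a$. The sketch below (hypothetical names; grid check, not a proof) uses $a = 0.01 \le c^{-2}$ and the distribution function $\ell_a(\beta) = c\, a^{-1/2}(1 - e^{-a\beta})$ of the measure $\ell_a(d\beta)$ above. It verifies $\ell_a(1/a) = a^{-1/2}$, the bound $Q \le 4e^{-|\ell_a(\beta) - \ell_a(\bar\beta)|}$, and $Q \ge \frac12$ whenever $|\ell_a(\beta) - \ell_a(\bar\beta)| < a^{1/2}$.

```python
import math

C = 1.0 / (1.0 - math.exp(-1.0))  # normalizing constant c of the prior pi_a


def info_logistic(x, beta):
    """Fisher information I(x, beta) = e^{x-beta} / (1 + e^{x-beta})^2 of model (4.4)."""
    t = math.exp(x - beta)
    return t / (1.0 + t) ** 2


def Q(beta, beta_bar):
    """(2.3) with xi[beta_bar] = delta_{beta_bar} and M(xi[beta], beta) = 1/4."""
    return info_logistic(beta_bar, beta) / 0.25


def ell(beta, a):
    """Scale ell_a(beta) = c a^{-1/2} (1 - e^{-a beta}) on [0, 1/a]."""
    return C * a ** -0.5 * (1.0 - math.exp(-a * beta))


def check_example_44(a=0.01, step=0.5):
    """Grid check of phi(z) = 4 e^{-|z|} in (2.4) and of Q >= 1/2 for |z| < a^{1/2}."""
    betas = [k * step for k in range(int(1.0 / (a * step)) + 1)]
    for b in betas:
        for bb in betas:
            z = ell(b, a) - ell(bb, a)
            q = Q(b, bb)
            if q > 4.0 * math.exp(-abs(z)) * (1 + 1e-12):
                return False
            if abs(z) < math.sqrt(a) and q < 0.5:
                return False
    return True
```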



Example 4.5. As pointed out by a referee, it is worth mentioning that Theorem 4.1 also applies to discrete priors (where its proof has to be modified slightly). Consider, for example, the logistic regression model (4.4) and a uniform prior $\pi_L$ on the set $\mathbb{N}_L = \{1, \dots, L\}$. If $\ell$ is the distribution function of the discrete measure with mass 1 at each element of $\mathbb{N}_L$, we have $\ell(B) = L$, and there is equality in (4.3) with $c_3 = 1$ (note that $\ell$ is a step function with jumps of size 1 at each element of $\mathbb{N}_L$). Consequently, we have, for all $\beta, \bar\beta \in \operatorname{supp}(\pi_L)$,

$$Q(\beta, \bar\beta) = \frac{4 e^{\bar\beta - \beta}}{(1 + e^{\bar\beta - \beta})^2} \ge \frac{4 e^{-1}}{(1 + e^{-1})^2} \ge \frac12$$

if $|\ell(\beta) - \ell(\bar\beta)| < 1$. Moreover, $Q(\beta, \bar\beta) \le 4e\, e^{-|\lfloor\beta\rfloor - \lfloor\bar\beta\rfloor|}$, and it follows from Theorem 4.1 that the number of support points of the Bayesian D-optimal design with respect to the prior $\pi_L$ becomes arbitrarily large as $L \to \infty$. We finally mention a consequence of Carathéodory's theorem: for a discrete prior with $L$ support points, there exists a Bayesian D-optimal design with at most $L m(m+1)/2$ support points; see [12]. In the present case ($m = 1$), this bound reduces to $L$ and tends to $\infty$.



Remark 4.6. Note that Theorem 4.1 requires a stronger decay of the function $\varphi$ than Theorem 3.1. As a consequence, the number of support points of Bayesian D-optimal designs usually increases more slowly with the length of the parameter space than in the maximin case. We have illustrated this fact in Examples 3.2 and 4.3, where we compared the standardized maximin and the Bayesian D-optimal designs with respect to the uniform distribution. On the other hand, it follows from the proof of Theorem 4.1 in the Appendix that a similar result holds for the Bayesian A-optimality criterion, where condition (4.2) can be replaced by the weaker condition (3.1). This explains the empirical results of [2] that Bayesian A-optimal designs usually have more support points than Bayesian D-optimal designs.

5. Conclusions. When efficient designs in nonlinear regression models are constructed, it has been observed numerically by many authors that the number of support points of Bayesian and maximin D-optimal designs increases with the amount of uncertainty about the a priori knowledge of the location of the nonlinear parameters. In this paper, we have established sufficient conditions under which the number of support points of Bayesian and maximin D-optimal designs can become arbitrarily large if the prior information on the unknown nonlinear parameters is diminished. The essential condition is the decay of the efficiency for large deviations between the specified and the "true" parameter. The conditions apply to many of the commonly used regression models. In fact, we did not find any model where these conditions are not satisfied.

For the sake of brevity and a clear presentation, we have restricted our investigations to nonlinear models where one parameter appears nonlinearly in the Fisher information. However, our approach can also be applied to models with more nonlinear parameters, although some of the arguments have to be adapted. The main idea is to introduce an appropriate norm for high-dimensional nonlinear parameters which generalizes the distance (2.6). These arguments are outlined in Appendix A.3. A similar result is also available for the Bayesian D-optimality criterion by combining this argument with the results of Section 4. Moreover, Theorems 3.1 and 4.1 can also be extended to nonrectangular regions.

In this paper, we have made a general statement on the structure of optimal designs with respect to the standardized maximin and Bayesian D-optimality criteria, which is important for a better understanding of these sophisticated optimality criteria. For a given model of interest, our methodology can be used to prove a phenomenon that has long been conjectured in the literature. In all examples that we have investigated, the developed theory was applicable, and we were able to prove that the number of support points of the standardized maximin and Bayesian D-optimal designs exceeds any given bound if the knowledge about the underlying parameter space is diminished. Moreover, we have also provided some explanation as to why standardized maximin D- and Bayesian A-optimal designs usually have more support points than Bayesian D-optimal designs.

We finally mention that the results for the Bayesian D-optimality criterion 
will have applications for estimating mixture distributions. To be precise, it 
was pointed out by Lindsay [14] that the determination of the ML-estimate 
of a mixture distribution corresponds to a Bayesian D-optimal design prob- 
lem in a one-parameter nonlinear model. It therefore follows from the results 
of the present paper that in many models, the number of components of the 
estimated mixture distribution increases with the sample size. 

APPENDIX: PROOFS 

A.1. Proof of Theorem 3.1. The proof consists of two steps. Set
$B = \ell(\beta_{\max}) - \ell(\beta_{\min})$. First we show that, for an arbitrary design, say $\xi_N$, with
N support points, it follows that

$$\Phi(\xi_N) = \min\left\{ \frac{|M(\xi_N,\beta)|}{|M(\xi[\beta],\beta)|} \;\middle|\; \beta \in [\beta_{\min},\beta_{\max}] \right\} \le d_1 B^{-\gamma}, \tag{A.1}$$

where $d_1$ is a positive constant not depending on B (it may depend on N) and
$\gamma > m$. Second we show that there exists a design $\xi_n$ (with at least n support
points) on $\mathcal{X}$ such that

$$\Phi(\xi_n) \ge \frac{d_2}{B^m} \tag{A.2}$$

for some positive constant $d_2$ not depending on B. Since $\gamma > m$, given N,
we have

$$d_1 B^{-\gamma} < \frac{d_2}{B^m}$$

if B is sufficiently large, and the optimal design is supported at more than N
points in this case. This proves the assertion. For the sake of a transparent
presentation, we begin with a proof of the estimates (A.1) and (A.2) in
the case m = 1. The general case will be treated in a second step (B), while
we prove in part (C) the remaining assertion of Theorem 3.1, considering
the case where the local D-optimal designs are minimally supported.
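The comparison of (A.1) with (A.2) can be made concrete with a short Python sketch. The helper names `crossover_B` and `threshold_m1` and all numerical constants are illustrative assumptions, not taken from the paper. The sketch returns the value $B^*$ beyond which $d_1 B^{-\gamma} < d_2 B^{-m}$. For m = 1 it uses $d_1 = c_1(2N+2)^{\gamma}$ and $d_2 = \lambda/2$ as in part (A), which shows how the threshold grows with N.

```python
def crossover_B(d1: float, d2: float, gamma: float, m: int) -> float:
    """Return B* such that d1 * B**(-gamma) < d2 * B**(-m) for every B > B*.

    The comparison is only meaningful for gamma > m, as in Theorem 3.1.
    """
    if gamma <= m:
        raise ValueError("need gamma > m")
    # d1 * B**-gamma < d2 * B**-m  <=>  B**(gamma - m) > d1 / d2
    return (d1 / d2) ** (1.0 / (gamma - m))


def threshold_m1(N: int, c1: float, gamma: float, lam: float) -> float:
    """Case m = 1 with d1 = c1 * (2N + 2)**gamma and d2 = lam / 2 (part (A))."""
    return crossover_B(c1 * (2 * N + 2) ** gamma, lam / 2, gamma, 1)


# Illustrative constants only: c1 = 1, gamma = 2, lambda = 1.
for N in range(1, 6):
    print(N, threshold_m1(N, c1=1.0, gamma=2.0, lam=1.0))
```

For these placeholder constants the thresholds are 32, 72, 128, 200 and 288. Once B exceeds the threshold for a given N, no N-point design can be standardized maximin D-optimal.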



(A) The case m = 1: To verify the estimate (A.1), let $\xi_N = \sum_{k=1}^{N} w_k \delta_{x_k}$
denote any design with mass $w_k$ at the point $x_k$ (k = 1, ..., N). Here, $\delta_{x_k}$
denotes the Dirac measure at the point $x_k$. Then

$$M(\xi_N,\beta) = \sum_{k=1}^{N} w_k I(x_k,\beta).$$



By assumption (2.9), there exist real numbers $\beta_{\min} \le \beta_1 \le \cdots \le \beta_N \le \beta_{\max}$
such that the inequality

$$M(\xi_N,\beta) \le \sum_{k=1}^{N} w_k M(\xi[\beta_k],\beta) = M(\xi[\beta],\beta)\sum_{k=1}^{N} w_k Q(\beta,\beta_k) \tag{A.3}$$

holds for all $\beta \in [\beta_{\min},\beta_{\max}]$. For convenience, we put $\beta_0 = \beta_{\min}$, $\beta_{N+1} = \beta_{\max}$.
Now, at least one gap between the numbers $\ell(\beta_k)$ must be large. Specifically, there
exists an index $j \in \{0, \dots, N\}$ such that

$$\ell(\beta_{j+1}) - \ell(\beta_j) \ge \frac{B}{N+1}. \tag{A.4}$$

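We consider inequality (A.3) at the point $\bar\beta$ defined by
$\ell(\bar\beta) = \frac{1}{2}[\ell(\beta_j) + \ell(\beta_{j+1})]$ and derive from (A.4) that

$$|\ell(\bar\beta) - \ell(\beta_k)| \ge \frac{1}{2}\big[\ell(\beta_{j+1}) - \ell(\beta_j)\big] \ge \frac{B}{2(N+1)} \tag{A.5}$$

for all $k \in \{0, 1, 2, \dots, N+1\}$. We now use inequality (A.3) and the definition
of Q in (2.3), and obtain from assumptions (2.4), (3.1) and (A.5)

$$M(\xi_N,\bar\beta) \le M(\xi[\bar\beta],\bar\beta)\sum_{k=1}^{N} w_k Q(\bar\beta,\beta_k) \le c_1\left(\frac{B}{2N+2}\right)^{-\gamma} M(\xi[\bar\beta],\bar\beta) \tag{A.6}$$

for some positive constant $c_1$. Since $\Phi(\xi_N) \le M(\xi_N,\bar\beta)/M(\xi[\bar\beta],\bar\beta)$, we set
$d_1 = c_1(2N+2)^{\gamma}$, and the proof of (A.1) is complete.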
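The pigeonhole step in (A.4) and (A.5) is easy to check empirically. The following Python sketch does so under the assumption that the values $\ell(\beta_1), \dots, \ell(\beta_N)$ lie in $[\ell(\beta_{\min}), \ell(\beta_{\max})]$. The helper names are hypothetical. The midpoint of the widest gap is at distance at least $B/(2(N+1))$ from every $\ell(\beta_k)$, $k = 0, \dots, N+1$.

```python
import random


def widest_gap_midpoint(points, lo, hi):
    """Return the midpoint of the widest gap and the sorted grid lo, points..., hi.

    `points` stand for ell(beta_1), ..., ell(beta_N); lo and hi stand for
    ell(beta_min) and ell(beta_max).
    """
    grid = [lo] + sorted(points) + [hi]
    j = max(range(len(grid) - 1), key=lambda i: grid[i + 1] - grid[i])
    return 0.5 * (grid[j] + grid[j + 1]), grid


def check_gap_bound(N, lo=0.0, hi=10.0, trials=1000, seed=0):
    """Empirically confirm (A.4)-(A.5) for random N-point configurations."""
    rng = random.Random(seed)
    B = hi - lo
    for _ in range(trials):
        pts = [rng.uniform(lo, hi) for _ in range(N)]
        mid, grid = widest_gap_midpoint(pts, lo, hi)
        # distance from the midpoint to every ell(beta_k), k = 0, ..., N+1
        dist = min(abs(mid - g) for g in grid)
        if dist < B / (2 * (N + 1)) - 1e-12:
            return False
    return True


print(all(check_gap_bound(N) for N in (1, 2, 5, 20)))
```

The bound is attained when the N points are equally spaced, which is why $B/(2N+2)$ and hence the factor $(2N+2)^{\gamma}$ in $d_1$ cannot be improved by this argument.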

For the proof of the lower bound (A.2), we may restrict ourselves to the case
$B > 4\lambda$, where $\lambda$ is the constant defined in Definition 2.1(ii). We choose
$n = \lceil B/(2\lambda) \rceil < B/\lambda$ and fix $\beta_k$ such that

$$\ell(\beta_k) = \ell(\beta_{\min}) + (2k-1)\frac{B}{2n} \qquad (k = 1, 2, \dots, n). \tag{A.7}$$

Note that these points are contained in the interval $[\beta_{\min}, \beta_{\max}]$ and that
their distance is at most $2\lambda$. Let $\xi[\beta_k]$, $k = 1, \dots, n$, denote the corresponding
local D-optimal designs and define

$$\xi_n = \frac{1}{n}\sum_{k=1}^{n} \xi[\beta_k]. \tag{A.8}$$

The design $\xi_n$ has at least n support points and its information matrix
satisfies

$$M(\xi_n,\beta) = \frac{1}{n}\sum_{k=1}^{n} M(\xi[\beta_k],\beta). \tag{A.9}$$




Obviously, given $\beta \in [\beta_{\min}, \beta_{\max}]$, there exists an index $j = j_\beta$ such that

$$|\ell(\beta) - \ell(\beta_j)| \le \lambda.$$

By construction and the definition of $\lambda$, $M(\xi[\beta_j],\beta) = Q(\beta,\beta_j)M(\xi[\beta],\beta) \ge \frac{1}{2}M(\xi[\beta],\beta)$.
Since all terms in the sum (A.9) are nonnegative, it follows that for all $\beta \in [\beta_{\min}, \beta_{\max}]$,

$$M(\xi_n,\beta) \ge \frac{1}{n}M(\xi[\beta_j],\beta) \ge \frac{1}{2n}M(\xi[\beta],\beta) \ge \frac{\lambda}{2B}M(\xi[\beta],\beta).$$

Recalling the definition of the standardized maximin criterion in (2.2), we
conclude that $\Phi(\xi_n) \ge \lambda/(2B)$. With the choice $d_2 = \lambda/2$, we have proven the
lower bound (A.2), and the proof in the case m = 1 is complete.
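The grid construction (A.7) can be illustrated numerically in Python. The sketch assumes the reading $\ell(\beta_k) = \ell(\beta_{\min}) + (2k-1)B/(2n)$ with $n = \lceil B/(2\lambda)\rceil$. The helper names and the values of B and $\lambda$ are hypothetical. It confirms that every point of the $\ell$-interval lies within $\lambda$ of some grid value, and that $1/(2n) \ge \lambda/(2B)$, which is the last step of the chain of inequalities above.

```python
import math


def grid_from_A7(ell_min, B, lam):
    """Values ell(beta_k) = ell_min + (2k - 1) * B / (2n), n = ceil(B / (2 lam)).

    Illustration of the construction only, not code from the paper.
    """
    n = math.ceil(B / (2 * lam))
    return n, [ell_min + (2 * k - 1) * B / (2 * n) for k in range(1, n + 1)]


def max_distance_to_grid(ell_min, B, grid, samples=10_001):
    """Largest distance from a sampled point of [ell_min, ell_min + B] to the grid."""
    worst = 0.0
    for i in range(samples):
        t = ell_min + B * i / (samples - 1)
        worst = max(worst, min(abs(t - g) for g in grid))
    return worst


B, lam = 10.0, 1.0          # illustrative values with B > 4 * lam
n, grid = grid_from_A7(0.0, B, lam)
print(n, grid)                                    # 5 [1.0, 3.0, 5.0, 7.0, 9.0]
print(max_distance_to_grid(0.0, B, grid) <= lam)  # every beta is lambda-close
print(1 / (2 * n) >= lam / (2 * B))               # the step 1/(2n) >= lambda/(2B)
```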

(B) The case m > 1: Let $\xi_N$ be a design with masses $w_k$ at the points
$x_k \in \mathcal{X}$ (k = 1, ..., N), and for any tuple $(i_1, \dots, i_m)$ with $1 \le i_1 < \cdots < i_m \le N$,
let $\xi[\beta^{(1)}_{i_1,\dots,i_m}], \dots, \xi[\beta^{(m)}_{i_1,\dots,i_m}]$ denote the designs corresponding to
the points $x_{i_1}, \dots, x_{i_m}$ by inequality (2.9). Using the definition of Q in (2.3)
and the Cauchy-Binet formula, we obtain

$$\begin{aligned}
|M(\xi_N,\beta)| &= \sum_{1\le i_1<\cdots<i_m\le N} w_{i_1}\cdots w_{i_m}\, I_m(x_{i_1},\dots,x_{i_m},\beta)\\
&\le c_0 \sum_{1\le i_1<\cdots<i_m\le N} w_{i_1}\cdots w_{i_m} \sum_{j=1}^{m} \big|M(\xi[\beta^{(j)}_{i_1,\dots,i_m}],\beta)\big|\\
&= c_0\, |M(\xi[\beta],\beta)| \sum_{j=1}^{m} \sum_{1\le i_1<\cdots<i_m\le N} w_{i_1}\cdots w_{i_m}\, Q(\beta,\beta^{(j)}_{i_1,\dots,i_m}).
\end{aligned} \tag{A.10}$$

Note that there are $m\binom{N}{m}$ terms in this sum and that inequality (A.10)
corresponds to (A.3) in the proof of the case m = 1. Therefore, ordering the
points $\beta^{(j)}_{i_1,\dots,i_m}$ and using exactly the same arguments as in the proof of part
(A) yields the upper bound $\Phi(\xi_N) \le d_1 B^{-\gamma}$ for some constant $d_1$ and $\gamma > m$.
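A numerical check of the Cauchy-Binet expansion used in the first line of (A.10) is given below in Python. It assumes, for illustration only, that each single-point information matrix is rank one, $I(x,\beta) = f(x,\beta)f(x,\beta)^T$, with random hypothetical vectors f. Under that assumption the term $I_m(x_{i_1},\dots,x_{i_m},\beta)$ is read as the squared determinant of the matrix with rows $f(x_{i_1}),\dots,f(x_{i_m})$. The definition of $I_m$ itself is not reproduced in this section. The sketch also counts the $m\binom{N}{m}$ terms of (A.10).

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(1)
m, N = 3, 6
F = rng.normal(size=(N, m))        # row k: hypothetical vector f(x_k, beta)
w = rng.dirichlet(np.ones(N))      # design weights w_1, ..., w_N (sum to 1)

# Information matrix of the design: sum_k w_k f(x_k) f(x_k)^T.
M = (F.T * w) @ F
lhs = np.linalg.det(M)

# Cauchy-Binet: sum over all m-subsets {i_1 < ... < i_m} of {1, ..., N}.
rhs = 0.0
for S in itertools.combinations(range(N), m):
    rows = list(S)
    rhs += np.prod(w[rows]) * np.linalg.det(F[rows]) ** 2

print(np.isclose(lhs, rhs))        # the two sides agree up to rounding
print(m * math.comb(N, m))         # number of terms in (A.10): m * C(N, m)
```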

In order to prove the corresponding lower bound, we define $n = \lceil B/(2\lambda) \rceil$
and again consider the quantities $\beta_k$ defined by (A.7). Let $\xi[\beta_k]$ denote the
corresponding local D-optimal design, define the design $\xi_n$ by (A.8) and
denote by $x_i$ and $w_i$ the corresponding support points and weights of $\xi_n$,
respectively. For any $\beta$, there exists a $\beta_j$ such that $|\ell(\beta) - \ell(\beta_j)| \le \lambda$, and we
denote the support points and weights of the corresponding local D-optimal
design $\xi[\beta_j]$ by $\tilde x_i$ and $\tilde w_i$, respectively. By the Cauchy-Binet formula, we
obtain

$$|M(\xi_n,\beta)| = \sum_{i_1<\cdots<i_m} w_{i_1}\cdots w_{i_m}\, I_m(x_{i_1},\dots,x_{i_m},\beta) \ge \frac{1}{n^m}\sum_{i_1<\cdots<i_m} \tilde w_{i_1}\cdots \tilde w_{i_m}\, I_m(\tilde x_{i_1},\dots,\tilde x_{i_m},\beta), \tag{A.11}$$

where the last inequality follows by omitting all terms containing points
which are not in the support of the local D-optimal design $\xi[\beta_j]$ (each support
point $\tilde x_i$ of $\xi[\beta_j]$ carries at least the weight $\tilde w_i/n$ in $\xi_n$). Using
assumption (2.5), we therefore obtain

$$|M(\xi_n,\beta)| \ge \frac{1}{n^m}|M(\xi[\beta_j],\beta)| = \frac{1}{n^m}Q(\beta,\beta_j)|M(\xi[\beta],\beta)| \ge \frac{1}{2n^m}|M(\xi[\beta],\beta)|, \tag{A.12}$$

and the same argument as presented in the proof for the case m = 1 shows the
lower bound $\Phi(\xi_n) \ge d_2/B^m$ for some positive constant $d_2$. The assertion
in the case m > 1 now follows by the same arguments as given in the first
part of the proof.

(C) Proof of the remaining assertion. If the local D-optimal designs are
supported at m points, the corresponding weights all equal 1/m (see [18]),
and the estimate in (A.11) can be improved as follows. Let $x_1, \dots, x_{m_0}$ denote
the points of the design $\xi_n$ which are support points of every local D-optimal
design, and define $w_1, \dots, w_{m_0}$ to be the corresponding weights. Note that
$w_i = 1/m$, $i = 1, \dots, m_0$; then

$$\begin{aligned}
|M(\xi_n,\beta)| &\ge \sum_{m_0 < i_{m_0+1} < \cdots < i_m} w_1\cdots w_{m_0}\, w_{i_{m_0+1}}\cdots w_{i_m}\, I_m(x_1,\dots,x_{m_0},x_{i_{m_0+1}},\dots,x_{i_m},\beta)\\
&\ge \frac{1}{n^{m-m_0}}\,|M(\xi[\beta_j],\beta)|.
\end{aligned}$$

Here, the first inequality follows by considering only the terms for which
$I_m$ contains the common support points $x_1, \dots, x_{m_0}$, while the second
inequality is obtained by considering only the term corresponding to the local
D-optimal design $\xi[\beta_j]$, whose remaining $m - m_0$ support points carry weight at
least $1/(nm)$ in $\xi_n$. The same argument as used for (A.12) now shows
that $\Phi(\xi_n) \ge d_2/B^{m-m_0}$ for some positive constant $d_2$, and the assertion
now follows, as explained in the first part of the proof.
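The weight bookkeeping behind the second inequality in part (C) can be checked exactly with rational arithmetic in Python. The configuration is a hypothetical worst case. There are n equally weighted m-point local designs sharing $m_0$ common points, and the private points of different designs are assumed distinct. If private points coincided, their weights could only grow. The product of the $\xi_n$-weights on the support of one local design is then exactly $n^{-(m-m_0)}$ times $(1/m)^m$.

```python
import math
from fractions import Fraction


def mixture_weights(n, m, m0):
    """Weights of xi_n = (1/n) * (xi[beta_1] + ... + xi[beta_n]).

    Hypothetical configuration: every local design has m support points with
    weight 1/m; the points "c0", ..., "c{m0-1}" are shared by all of them,
    and design k owns the private points (k, m0), ..., (k, m - 1).
    """
    w = {}
    for k in range(n):
        support = [f"c{i}" for i in range(m0)] + [(k, i) for i in range(m0, m)]
        for x in support:
            w[x] = w.get(x, Fraction(0)) + Fraction(1, n * m)
    return w


n, m, m0 = 5, 3, 1
w = mixture_weights(n, m, m0)
support_j = [f"c{i}" for i in range(m0)] + [(0, i) for i in range(m0, m)]

print(w["c0"])                                   # common point: weight 1/m
ratio = math.prod(w[x] for x in support_j) / Fraction(1, m) ** m
print(ratio)                                     # equals n**-(m - m0)
```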

A.2. Proof of Theorem 4.1. We restrict ourselves to the case m = 1.
The case m > 1 can be obtained by adapting arguments from part (B) in
the proof of Theorem 3.1.

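Moreover, for convenience, we first assume that the given transformation $\ell$
is the identity and define $B = \beta_{\max} - \beta_{\min}$, the length of $\mathcal{B} = [\beta_{\min}, \beta_{\max}]$. For
a given design $\xi_N$ with N support points, we know that inequality (A.3)
holds. With assumption (4.3) and the notation $\beta_0 = \beta_{\min}$, $\beta_{N+1} = \beta_{\max}$ and
$\Delta_j = \frac{1}{2}(\beta_{j+1} - \beta_j)$, we estimate the contribution of the interval $[\beta_j, \beta_j + \Delta_j]$
to the Bayesian D-optimality criterion via

$$\begin{aligned}
\frac{c_3}{B}\int_{\beta_j}^{\beta_j+\Delta_j} \log \sum_{k=1}^{N} w_k \frac{|M(\xi[\beta_k],t)|}{|M(\xi[t],t)|}\,dt
&\le \frac{c_3}{B}\int_{\beta_j}^{\beta_j+\Delta_j} \log\big(c_1 \exp(-|t-\beta_j|^{\gamma})\big)\,dt\\
&= \frac{c_3}{B}\int_{0}^{\Delta_j} \big[\log c_1 - z^{\gamma}\big]\,dz
= \frac{c_3}{B}\left[\Delta_j \log c_1 - \frac{\Delta_j^{1+\gamma}}{1+\gamma}\right].
\end{aligned}$$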



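The same bound is derived for the interval $[\beta_j + \Delta_j, \beta_{j+1}]$. Summing over
all intervals of this form, we conclude that the Bayesian D-optimality
criterion of $\xi_N$ is bounded by

$$\frac{2c_3}{B}\sum_{j=0}^{N}\left[\Delta_j \log c_1 - \frac{\Delta_j^{1+\gamma}}{1+\gamma}\right].$$

Since $\sum_{j=0}^{N} \Delta_j = B/2$ and the function $z \mapsto z^{1+\gamma}$ is strictly convex, this bound
attains its maximum if all $\Delta_j$'s are equal, and we obtain the upper bound

$$\int_{\mathcal{B}} \log\frac{|M(\xi_N,\beta)|}{|M(\xi[\beta],\beta)|}\,\pi(d\beta) \le c_4\left[\log c_1 - \frac{B^{\gamma}}{(2N+2)^{\gamma}(1+\gamma)}\right] \tag{A.13}$$

for some positive constant $c_4$. Note that the right-hand side of this inequality
is dominated by the term with $B^{\gamma}$ when $B \to \infty$.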
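The two facts used in the last displays can be checked numerically in Python. The first is the closed form of the integral over one half-gap. The second is the claim that, among all splits of $B/2$ into $N+1$ half-gaps $\Delta_j$, the equal split maximizes the summed bound. The function names and the constants $c_1, c_3, \gamma, B, N$ are illustrative assumptions, not values from the paper.

```python
import math
import random


def half_gap_bound(delta, c1, gamma):
    """Closed form of the integral of (log c1 - z**gamma) over [0, delta]."""
    return delta * math.log(c1) - delta ** (1 + gamma) / (1 + gamma)


def midpoint_integral(delta, c1, gamma, steps=20_000):
    """Midpoint-rule approximation of the same integral (sanity check)."""
    h = delta / steps
    return h * sum(math.log(c1) - ((i + 0.5) * h) ** gamma for i in range(steps))


def summed_bound(deltas, c1, gamma, c3, B):
    """(2 c3 / B) * sum_j [Delta_j log c1 - Delta_j**(1+gamma) / (1+gamma)]."""
    return 2 * c3 / B * sum(half_gap_bound(d, c1, gamma) for d in deltas)


c1, gamma, c3, B, N = 2.0, 2.0, 1.0, 12.0, 4    # illustrative constants
equal = [B / (2 * (N + 1))] * (N + 1)            # all Delta_j equal, sum B/2
closed = c3 * (math.log(c1) - (B / (2 * N + 2)) ** gamma / (1 + gamma))

rng = random.Random(0)
worst_gap = 0.0
for _ in range(1000):
    u = [rng.random() for _ in range(N + 1)]
    deltas = [B / 2 * x / sum(u) for x in u]     # random split of B/2
    worst_gap = max(worst_gap,
                    summed_bound(deltas, c1, gamma, c3, B)
                    - summed_bound(equal, c1, gamma, c3, B))

print(abs(half_gap_bound(1.3, c1, gamma) - midpoint_integral(1.3, c1, gamma)) < 1e-6)
print(abs(summed_bound(equal, c1, gamma, c3, B) - closed) < 1e-12)
print(worst_gap <= 1e-12)   # no split of B/2 beats the equal split
```

With equal gaps, the summed bound reduces to $c_3[\log c_1 - B^{\gamma}/((2N+2)^{\gamma}(1+\gamma))]$, which is the form of (A.13).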

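The construction of a better design with respect to the Bayesian optimality
criterion follows the arguments given in part (A) of the proof of Theorem
3.1. Let $\lambda > 0$ be defined by

$$Q(\beta,\tilde\beta) \ge \frac{1}{2} \quad \text{whenever } |\beta - \tilde\beta| \le \lambda.$$

Set $n = \lceil B/(2\lambda) \rceil$ and $\beta_j = \beta_{\min} + (2j-1)\lambda$ for $j = 1, 2, \dots, n$. We choose a design
$\bar\xi$ with (at least) n support points such that

$$M(\bar\xi,\beta) = \frac{1}{n}\sum_{j=1}^{n} M(\xi[\beta_j],\beta)$$

[see the identity in (A.9)]. For any given $\beta \in [\beta_{\min}, \beta_{\max}]$, there exists a $\beta_j$
with $|\beta - \beta_j| \le \lambda$ satisfying (2.5). Therefore, it follows for all $\beta \in \mathcal{B}$ that

$$M(\bar\xi,\beta) \ge \frac{1}{n}M(\xi[\beta_j],\beta) = \frac{1}{n}Q(\beta,\beta_j)M(\xi[\beta],\beta) \ge \frac{1}{2n}M(\xi[\beta],\beta).$$

Hence,

$$\int_{\mathcal{B}} \log\frac{M(\bar\xi,\beta)}{M(\xi[\beta],\beta)}\,\pi(d\beta) \ge \log\frac{1}{2n} \ge -\log B + \log\frac{\lambda}{2}.$$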






This value is larger than the upper bound (A.13) if B is sufficiently large.
Therefore, a design with N support points cannot be optimal if B is
sufficiently large. Thus far, we have restricted ourselves to the case $\ell(\beta) = \beta$.
The general case proceeds in exactly the same way, where one must choose
$dt = \ell(d\beta)/\ell(\mathcal{B})$ in the first integral of the proof, and the boundaries of the
intervals must be adapted. The details are left to the reader.
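The final comparison can be traced in Python. The sketch compares the lower bound $-\log B + \log(\lambda/2)$ for the mixture design with the right-hand side of (A.13) for an N-point design. The doubling search, the function names and the constants $c_1, c_4, \gamma, \lambda$ are hypothetical placeholders. It only shows that, because the logarithm decays much more slowly than $B^{\gamma}$, the mixture design eventually wins for every fixed N.

```python
import math


def upper_A13(B, N, c1, c4, gamma):
    """Right-hand side of (A.13) for an N-point design."""
    return c4 * (math.log(c1) - B ** gamma / ((2 * N + 2) ** gamma * (1 + gamma)))


def lower_mixture(B, lam):
    """Lower bound -log B + log(lam / 2) for the mixture design."""
    return math.log(lam / 2) - math.log(B)


def first_crossing(N, c1, c4, gamma, lam):
    """First B in the sequence max(1, 2 lam) * 2**k (k = 0, 1, ...) at which the
    lower bound for the mixture design exceeds the upper bound (A.13)."""
    B = max(1.0, 2 * lam)
    while lower_mixture(B, lam) <= upper_A13(B, N, c1, c4, gamma):
        B *= 2
    return B


# Illustrative constants only (not from the paper).
print(first_crossing(N=3, c1=2.0, c4=1.0, gamma=2.0, lam=1.0))   # 32.0
```

Because the right-hand side of (A.13) increases with N, the crossing point found this way never decreases as N grows.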

A.3. A comment on more "nonlinear" parameters. In this paragraph,
we briefly describe how the arguments need to be changed if there exist
p > 1 nonlinear parameters, say $\beta = (\beta_1, \dots, \beta_p)$, which appear nonlinearly
in the Fisher information matrix. For the sake of brevity, we consider the
standardized maximin criterion with a p-dimensional cube $\mathcal{B}$. First, note
that condition (2.9) does not depend on the dimension of the parameter $\beta$.
Second, let d denote the metric induced by a norm on $\mathbb{R}^p$ and replace conditions (2.4) and (2.5) by

$$Q(\beta,\tilde\beta) \le \varphi\big(d(\beta,\tilde\beta)\big) \tag{A.14}$$

and

$$Q(\beta,\tilde\beta) \ge \frac{1}{2} \quad \text{whenever } d(\beta,\tilde\beta) \le \lambda, \tag{A.15}$$

respectively. We then show that the number of support points of the
standardized maximin D-optimal design becomes arbitrarily large if the volume
of the cube $\mathcal{B}$ converges to infinity. In other words, Theorem 3.1 also holds
in the case where the p > 1 parameters appear nonlinearly in the Fisher
information.

For this, we note that the proof of Theorem 3.1 proceeds by establishing
the bounds (A.1) and (A.2). Let $\xi_N$ denote the design considered in
(A.10) and note that the estimate (A.10) does not depend on the dimension
of $\beta$. For r > 0, define the ball with center $\beta^{(j)}_{i_1,\dots,i_m}$ and radius r by

$$U_r(j,i_1,\dots,i_m) = \{x \in \mathbb{R}^p \mid d(x,\beta^{(j)}_{i_1,\dots,i_m}) < r\}.$$

There exists a minimal $r_{\min}$ such that $\mathcal{B}$ can be covered by balls of this type,
that is,

$$\mathcal{B} \subset \bigcup_{j=1}^{m}\ \bigcup_{i_1<\cdots<i_m} U_{r_{\min}}(j,i_1,\dots,i_m).$$

Obviously, we have, for some constant c > 0 (depending on N and m), that $r_{\min} \ge cB$,
where B denotes the pth root of the volume of $\mathcal{B}$. Consequently, there exists a
$\bar\beta \in \mathcal{B}$ such that

$$d(\bar\beta,\beta^{(j)}_{i_1,\dots,i_m}) \ge r_{\min}/2 \ge cB/2$$

for all j and all $i_1 < \cdots < i_m$. Thus, replacing condition (2.4) by (A.14) in the argument
(A.6) yields the upper bound (A.1) for any N-point design. The remaining inequality (A.2)
is obtained similarly by covering $\mathcal{B}$ with balls of radius $\lambda$.
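The claim $r_{\min} \ge cB$ is a volume argument, sketched here in Python under the assumption that d is the Euclidean distance. The paper allows any norm, which only changes the constant. If $K = m\binom{N}{m}$ balls of radius r cover a cube of volume $B^p$, then $K v_p r^p \ge B^p$, where $v_p$ is the volume of the unit ball. Hence $r \ge B/(K v_p)^{1/p}$, so c may be taken as $(K v_p)^{-1/p}$. The random centers and all constants below are hypothetical.

```python
import math
import random


def unit_ball_volume(p):
    """Volume of the Euclidean unit ball in R^p."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


def covering_radius_lower_bound(B, p, K):
    """If K Euclidean balls of radius r cover a cube of volume B**p, then
    K * v_p * r**p >= B**p, hence r >= B / (K * v_p)**(1/p)."""
    return B / (K * unit_ball_volume(p)) ** (1 / p)


def empirical_covering_radius(centers, B, steps):
    """Max over a grid of the square [0, B]^2 of the distance to the nearest center."""
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = B * i / steps, B * j / steps
            worst = max(worst, min(math.dist((x, y), c) for c in centers))
    return worst


m, N, p, B = 2, 5, 2, 10.0                  # illustrative values
K = m * math.comb(N, m)                      # number of centers beta^(j)_{i_1..i_m}
rng = random.Random(0)
centers = [(rng.uniform(0, B), rng.uniform(0, B)) for _ in range(K)]
bound = covering_radius_lower_bound(B, p, K)
steps = 150
emp = empirical_covering_radius(centers, B, steps)
print(bound)                                 # B / sqrt(20 * pi), about 1.26
# The grid misses the true covering radius by at most half a cell diagonal,
# which is smaller than B / steps.
print(emp + B / steps >= bound)
```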